5 research outputs found

    MaxPart: An Efficient Search-Space Pruning Approach to Vertical Partitioning

    Vertical partitioning is the process of subdividing the attributes of a relation into groups, creating fragments. It is an effective way of improving performance in database systems where a significant percentage of query processing time is spent on full table scans. Most proposed approaches to vertical partitioning use pairwise affinity to cluster the attributes of a given relation, where affinity measures the frequency with which a pair of attributes is accessed together. Attributes with high affinity are clustered together so as to create fragments containing a maximum of strongly connected attributes. However, such fragments can be obtained directly and efficiently through maximal frequent itemsets. This knowledge engineering technique better reflects closeness, or affinity, when more than two attributes are involved, and the partitioning process can be done faster and more accurately with the help of such a data mining technique. In this paper, an approach to vertical partitioning based on maximal frequent itemsets is proposed to efficiently search for an optimized solution by judiciously pruning the potential search space. Moreover, we propose an analytical cost model to evaluate the produced partitions. Experimental studies show that the cost of the partitioning process can be substantially reduced using only a limited set of potential fragments; they also demonstrate the effectiveness of our approach in partitioning both small and large tables.
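    As a rough illustration of the idea, the sketch below mines maximal frequent itemsets from a toy workload (each query reduced to the set of attributes it touches) and uses them as candidate fragments. The Apriori-style miner, the min_support threshold, and the workload are assumptions for illustration; the paper's actual algorithm and cost model are not reproduced here.

```python
# A minimal sketch, not the paper's algorithm: candidate fragments from
# maximal frequent itemsets of co-accessed attributes. All names and the
# min_support threshold are illustrative assumptions.
from itertools import combinations

def frequent_itemsets(queries, min_support):
    """Apriori-style mining over attribute sets co-accessed by >= min_support queries."""
    level = [frozenset([a]) for a in {a for q in queries for a in q}]
    frequent = []
    while level:
        kept = [s for s in level if sum(1 for q in queries if s <= q) >= min_support]
        frequent.extend(kept)
        # candidates one item larger, built only from surviving sets
        level = list({a | b for a, b in combinations(kept, 2) if len(a | b) == len(a) + 1})
    return frequent

def maximal(itemsets):
    """Keep only itemsets not strictly contained in another frequent itemset."""
    return [s for s in itemsets if not any(s < t for t in itemsets)]

def candidate_fragments(queries, attributes, min_support=2):
    """Maximal frequent itemsets as fragments; uncovered attributes become singletons."""
    frags = maximal(frequent_itemsets(queries, min_support))
    covered = set().union(*frags) if frags else set()
    return frags + [frozenset([a]) for a in attributes - covered]

# Toy workload: each query is the set of attributes it scans.
queries = [{"a", "b", "c"}, {"a", "b"}, {"a", "b", "c"}, {"d", "e"}, {"d", "e"}]
print(candidate_fragments(queries, {"a", "b", "c", "d", "e"}))
# -> fragments such as {a, b, c} and {d, e}
```

    Note that maximal itemsets can overlap, so a real partitioner must still resolve overlaps and score the candidates with a cost model, as the paper does.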

    OPTASSIST: A RELATIONAL DATA WAREHOUSE OPTIMIZATION ADVISOR

    Data warehouses store large amounts of data that are usually accessed by complex decision-making queries with many selection, join, and aggregation operations. To optimize the performance of the data warehouse, the administrator has to carry out a physical design. During the physical design phase, the data warehouse administrator has to select optimization techniques to speed up queries, making many choices: which optimization techniques to apply, their selection algorithms, the parameters of these algorithms, and the attributes and tables used by some of these techniques. We describe in this paper the nature of the difficulties encountered by the administrator during physical design. We then present a tool that helps the administrator make the right choices for optimization, and we demonstrate its interactive use on a relational data warehouse created and populated from the APB-1 benchmark.
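    The core of such an advisor can be pictured as a search over the administrator's decision space. The sketch below is illustrative only: the candidate configurations and the cost model are hypothetical placeholders, not OptAssist's internals.

```python
# Illustrative only: the kind of search an advisor automates when the
# administrator faces many (technique, algorithm, parameters) combinations.
def advise(workload, candidates, cost_model):
    """Return the candidate configuration the cost model scores lowest."""
    return min(candidates, key=lambda choice: cost_model(choice, workload))

# Hypothetical decision space: technique, selection algorithm, parameters.
candidates = [
    ("bitmap_join_index", "greedy", {"max_indexes": 5}),
    ("bitmap_join_index", "genetic", {"population": 40}),
    ("horizontal_partitioning", "affinity", {"max_fragments": 8}),
]

def toy_cost_model(choice, workload):
    technique, algorithm, params = choice
    # A real advisor would plug in an analytical cost model here.
    return len(workload) / (1 + sum(params.values()))

print(advise(["q1", "q2", "q3"], candidates, toy_cost_model))
```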

    Controlling the Trade-Off between Resource Efficiency and User Satisfaction in NDNs Based on Naïve Bayes Data Classification and Lagrange Method

    This paper addresses the fundamental problem of the trade-off between resource efficiency and user satisfaction in the limited environments of Named Data Networks (NDNs). The proposed strategy, named RADC (Resource Allocation based Data Classification), aims at managing this trade-off by controlling the system's fairness index. To this end, a machine learning technique based on Multinomial Naive Bayes is used to classify the received contents. Then, an adaptive resource allocation strategy based on the Lagrange utility function is proposed. To cache the received content, an adequate content placement and replacement mechanism are enforced. Simulation at the system level shows that this strategy could be a powerful tool for administrators to manage the trade-off between efficiency and user satisfaction.

    This work is derived from R&D project RTI2018-096384-B-I00, funded by MCIN/AEI/10.13039/501100011033 and "ERDF A way of making Europe".

    Herouala, AT.; Kerrache, CA.; Ziani, B.; Tavares De Araujo Cesariny Calafate, CM.; Lagraa, N.; Tahari, AEK. (2022). Controlling the Trade-Off between Resource Efficiency and User Satisfaction in NDNs Based on Naïve Bayes Data Classification and Lagrange Method. Future Internet, 14(2):1-14. https://doi.org/10.3390/fi14020048
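    The two ingredients named in the abstract can be sketched as follows: a Multinomial Naive Bayes classifier (here via scikit-learn) labels incoming content, and a Lagrange-style utility maximization splits a cache budget across the resulting classes. The training data, class labels, and per-class weights are invented for illustration; RADC's actual features and utility function are not reproduced.

```python
# A minimal sketch, assuming scikit-learn; training data, class labels and
# per-class weights are invented, not RADC's.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

train_docs = ["live video stream", "software update file", "sensor reading log"]
train_labels = ["streaming", "file", "iot"]

vec = CountVectorizer()
clf = MultinomialNB().fit(vec.fit_transform(train_docs), train_labels)

def allocate(budget, weights):
    """Lagrange solution of max sum_i w_i*log(x_i) s.t. sum_i x_i = budget:
    setting d/dx_i [w_i*log(x_i) - lam*x_i] = 0 gives x_i = budget*w_i/sum(w)."""
    total = sum(weights.values())
    return {cls: budget * w / total for cls, w in weights.items()}

incoming = ["firmware update file", "4k video stream"]
print(clf.predict(vec.transform(incoming)))                     # classified content
print(allocate(100.0, {"streaming": 3.0, "file": 2.0, "iot": 1.0}))
```

    Computing Jain's fairness index over such per-class allocations is one way to quantify the trade-off the paper controls.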

    A Constraint-based Mining Approach for Multi-attribute Index Selection

    The index selection problem (ISP) concerns the selection of an appropriate set of indexes that minimizes the total cost of a given workload under a storage constraint. Since the ISP has been proven to be NP-hard, most studies focus on heuristic algorithms that obtain approximate solutions. The problem becomes more difficult for indexes defined on multiple tables, such as bitmap join indexes, since it requires the exploration of a large search space. Studies dealing with the selection of bitmap join indexes have mainly focused on pruning the search space by means of data mining techniques or heuristic strategies. The main shortcoming of these approaches is that the index selection process is performed in two steps: the generation of a large number of indexes is followed by a pruning phase. An alternative is to constrain the input data earlier in the selection process, thereby reducing the output size and directly discovering indexes that are of interest to the administrator. For example, to select a set of indexes, the administrator may put limits on the number of attributes or on the cardinality of the attributes to be included in the index configuration being sought. In this paper we address the bitmap join index selection problem using a constraint-based approach; unlike previous approaches, the selection is performed in one step by introducing constraints into the selection process. The proposed approach is evaluated using the APB-1 benchmark.
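    One way to picture the one-step idea is to push anti-monotone constraints (maximum number of attributes, maximum attribute cardinality) into the candidate-generation loop itself, so disallowed index configurations are never materialized. The miner, workload, and thresholds below are illustrative assumptions, not the paper's algorithm.

```python
# Hedged sketch: constraints checked while candidates are generated,
# not in a separate pruning pass. Workload and thresholds are invented.
from itertools import combinations

def candidate_indexes(queries, cardinality, min_support, max_attrs, max_card):
    ok = lambda s: len(s) <= max_attrs and all(cardinality[a] <= max_card for a in s)
    level = [s for s in {frozenset([a]) for q in queries for a in q} if ok(s)]
    selected = []
    while level:
        kept = [s for s in level if sum(1 for q in queries if s <= q) >= min_support]
        selected.extend(kept)
        level = [c for c in {a | b for a, b in combinations(kept, 2)
                             if len(a | b) == len(a) + 1} if ok(c)]  # constraints pushed here
    return selected

queries = [{"city", "year"}, {"city", "year"}, {"city", "product"}]
cardinality = {"city": 500, "year": 10, "product": 100000}
print(candidate_indexes(queries, cardinality, min_support=2, max_attrs=2, max_card=1000))
# "product" (too many distinct values) never even enters candidate generation
```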

    UML4NoSQL: A Novel Approach for Modeling NoSQL Document-Oriented Databases Based on UML

    The adoption of Big Data systems by companies is relatively new, although data modeling and system design are ages old. Although traditional databases are built on solid foundations, they cannot handle the swift and massive flow of data coming from multiple different sources; NoSQL databases are therefore an inevitable alternative. However, these systems are schemaless compared to traditional databases. It is important to emphasize that schemaless does not mean no-schema, which would imply that NoSQL databases do not need modeling. Hence, there is a need for conceptual models to define the data structure in these databases. This paper sheds light on the importance of UML in showing how to store Big Data, described through meta-models, within NoSQL databases. We propose a novel Big Data modeling methodology for NoSQL databases called UML4NoSQL, which is independent of the target system and takes into account the four Big Data characteristics: Variety, Volume, Velocity, and Veracity (the 4 V's). The approach relies on UML blocks with a data-up technique; it starts with a use case and the class diagram resulting from an understanding of the data at hand and the definition of the developer's strategies, while focusing on the user's needs. To illustrate our approach, we take a case study from the health care domain and show that our approach produces designs that can be implemented on a NoSQL document-oriented system while respecting the Big Data 4 V's.
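    As a toy illustration of the class-diagram-to-document direction, the sketch below embeds composed classes as nested arrays, which is one common mapping rule for document stores. The meta-model, the health-care classes, and the embedding rule are assumptions, not UML4NoSQL's actual transformation.

```python
# Toy sketch only: one possible UML-class-to-document mapping rule.
# The meta-model and classes are invented for illustration.
classes = {
    "Patient": {"attrs": ["name", "birthDate"], "composes": ["Visit"]},
    "Visit":   {"attrs": ["date", "diagnosis"], "composes": []},
}

def to_document_schema(cls, model):
    """Map a class to a document; UML composition becomes an embedded array."""
    doc = {attr: "string" for attr in model[cls]["attrs"]}
    for child in model[cls]["composes"]:
        doc[child.lower() + "s"] = [to_document_schema(child, model)]
    return doc

print(to_document_schema("Patient", classes))
# {'name': 'string', 'birthDate': 'string',
#  'visits': [{'date': 'string', 'diagnosis': 'string'}]}
```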